IMPROVING THE TOOLS OF SYMBOLIC LEARNING


Yves  Kodratoff 
Bâtiment 490, Laboratoire de Recherche en Informatique
Université Paris-Sud, UA-410 du CNRS,

F  -  91405  Orsay  France 


RESUME 


In the first part of this article, we give some consequences of the choice of a definition of the
notion of Generalization. We discuss the relations between definitions based on deduction and
those based on substitution.

In a second part, we show how a symbolic approach can account, at least partially, for the noise
present in all real data. We discuss this approach for Scene Analysis and for the acquisition of
rules and of control strategies. Finally, we present our idea of a Polymorphic Version Space.


SUMMARY 




In its first part, this paper presents some consequences of the choice of the definition of Generalization.
It discusses the definitions based on deduction, versus those based on substitution.

In its second part, it shows how symbolic computations are also able to take into account, at least
partly, the noise that most real-life data show. It discusses symbolic approaches to noise handling in
Scene Analysis, rule learning and strategy learning and, finally, the idea of a polymorphic Version Space.

INTRODUCTION


In this paper, we shall emphasize two aspects of that part of symbolic Machine Learning (ML) which deals
with learning from sets of several examples, and the aim of which is "moving from more specific
descriptions to more general descriptions" [Langley 1986] (called here generalization).

The first aspect concerns the practical consequences of a theoretical puzzle.

Among the important techniques used in ML are techniques of generalization, and specialists in ML
build systems that attempt to provide descriptors (i.e., atomic formulas) that have the best degree
of generalization. For instance, the Version Space [Mitchell 1982] paradigm is a method that helps to
find the exact generalization state in which a descriptor must be used in order to optimize the problem
solving efficiency of the operators making use of this descriptor.

It is then somewhat surprising to see that classical Logics do not define the generalization state of an
atomic formula. The only existing logical tool is relative to disjunctive formulas and is called
subsumption, while substitution defines the relative generality of terms (i.e., formal functional
expressions that are not evaluated).

We shall attempt to clarify this situation, up to the point where some of the practical consequences of
our theoretical choices can be seen.

In section 1, we study definitions of the generalization of implications and of conjunctive formulas, and
their differences. We also study the practical consequences of choosing Modus Ponens instead of the
Generalization Principle as an inference rule. Another, related, topic of section 1 is the discussion of
the use of the properties of the descriptions one wants to learn from.

The other aspect is: how far can symbolic methods, as opposed to numeric ones, be of use?

In most presently published works, as soon as some noise or some polymorphy (i.e., when concepts
have intersecting sets of instances) has to be taken into account, the authors rush upon numerical
representations, which they claim to be the only way to cope with those problems. We have chosen the
opposite approach, which is to stick as far as possible to symbolic representations, even if it may at
first seem to go absurdly far. For instance, we would represent polymorphy by putting upper and lower
bounds on the properties of concepts, rather than assigning to a given instance so much chance of
belonging to one concept and so much of belonging to another one.

Section 2 will be devoted to the study of the symbolic handling of noise and polymorphy with, for
instance, a presentation of "Polymorphic Version Spaces", which illustrates well how seemingly purely
symbolic techniques can also be applied in a wider context.

Our approach aims at improving the provability of each learning step, and we believe that provability
is a necessary (if not sufficient) step for explicability. This last statement is well illustrated by EBG
[Mitchell & al. 1986], where explanations are derived from proofs. In our opinion, this point is of much
importance; this is why we shall come back to it in the conclusion.


1. - DIFFERENT DEFINITIONS OF GENERALIZATION


1.1 - Intuitive Definition of Generalization


There exists one definition which is agreed upon by all authors, the most intuitive one. We give it
under a simplified form where the formulas depend on one variable only. When there are several
variables, one has to take into account the fact that each variable is relative to a given object.
Object oriented generalization is a rather new topic [Manago 1986]; we will not go into it because we
would like to stick to well-known concepts in this section.

Let P(x) and Q(y) be two formulas.

Let us denote by {P_TRUE} the set of the instances of x such that P(x) = TRUE, and similarly for Q.

{P_TRUE} = { x / P(x) = TRUE }

{Q_TRUE} = { y / Q(y) = TRUE }

Then one says that P(x) is more general than Q(y) iff {P_TRUE} ⊇ {Q_TRUE}.

This definition is the one actually used when one wants to show that, say, P(x) is not more general
than Q(y). In that case it is enough to exhibit an instance for which P is FALSE and Q is TRUE.
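The intuitive definition can be checked mechanically only when the domain of instances is finite and enumerable. The following is a minimal sketch under that assumption; the function names and the two arithmetic predicates are hypothetical and chosen only for illustration.

```python
def true_set(pred, domain):
    """Return the set of instances of `domain` on which `pred` is TRUE."""
    return {x for x in domain if pred(x)}


def more_general(p, q, domain):
    """P is more general than Q iff the TRUE-set of P contains that of Q."""
    return true_set(p, domain) >= true_set(q, domain)


def counter_example(p, q, domain):
    """Return an instance where P is FALSE and Q is TRUE, or None if there is none."""
    for x in domain:
        if q(x) and not p(x):
            return x
    return None


# Hypothetical finite domain and predicates.
domain = range(10)
is_even = lambda x: x % 2 == 0
is_multiple_of_4 = lambda x: x % 4 == 0
```

Here `more_general(is_even, is_multiple_of_4, domain)` holds, while `counter_example(is_multiple_of_4, is_even, domain)` exhibits an instance refuting the converse. The sketch only tests a given pair of formulas; it does not compute a generalization from instances.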

The problem, however, is to be able to compute a generalization from its instances, and the above
definition gives no way to achieve this goal. This is why alternate definitions, leading to a
generalization algorithm, have been developed.

1.2 - Vere's definition of generalization


Let us first consider a conjunction of descriptors. A formula has therefore the form

A = A₁ & ... & Aₙ

where each Aᵢ is a descriptor.

Let {A} be called the set associated to A, defined by

{A} = {A₁, ..., Aₙ}

Then A is more general than B iff there is

- an expression B' such that {B'} ⊆ {B}

- a substitution σ such that σA = B'.

Otherwise stated, σA is equal to a subpart of B, up to a variable renaming.

For disjunctions of conjunctions, this definition becomes: let G_a = g_a,1 ∨ ... ∨ g_a,m and
G_b = g_b,1 ∨ ... ∨ g_b,n; then G_a is more general than G_b iff ∀j ∃i such that g_a,i is more
general than g_b,j.

The main drawback of this definition is that it gives no control over the way conjuncts are dropped
during the generalization process.
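For conjunctions, this definition can be turned into a small backtracking search. The sketch below is only an illustration: it assumes that atoms are tuples `(predicate, arg, ...)`, that variables are lowercase strings and constants uppercase strings, and the names `match_atom` and `vere_more_general` are hypothetical.

```python
def is_var(t):
    """Assumed convention: variables are lowercase strings, constants uppercase."""
    return isinstance(t, str) and t.islower()


def match_atom(pattern, atom, subst):
    """Extend `subst` so that subst(pattern) == atom, or return None."""
    if pattern[0] != atom[0] or len(pattern) != len(atom):
        return None
    subst = dict(subst)
    for p, a in zip(pattern[1:], atom[1:]):
        if is_var(p):
            if subst.setdefault(p, a) != a:
                return None          # variable already bound to something else
        elif p != a:
            return None              # two different constants
    return subst


def vere_more_general(a, b, subst=None):
    """Return a substitution s such that every atom of s(A) occurs in B, or None."""
    subst = {} if subst is None else subst
    if not a:
        return subst
    for atom in b:                   # try each atom of B for the first atom of A
        extended = match_atom(a[0], atom, subst)
        if extended is not None:
            result = vere_more_general(a[1:], b, extended)
            if result is not None:
                return result
    return None


A = [("ON", "x", "y")]
B = [("ON", "A", "B"), ("NEAR", "B", "C")]
```

With these inputs, `vere_more_general(A, B)` binds x to A and y to B, and the conjunct `NEAR(B, C)` is simply ignored, which is the uncontrolled dropping of conjuncts mentioned above.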

1.3 - Existential versus Universal quantification


The type of quantification of the variables introduced during the generalization process depends on

1 - the form of the expressions given as examples

2 - the use of the generalized expression.

The form of the expressions given as examples depends very much on the way the information is
represented.

Consider the English sentence "That particular crow, named Jack, is black".

It can be interpreted either as an implication, or as a conjunction. Disputing which is the best
would be outside the scope of this paper.


In the first case, its first order logic representation will be:

CROW(JACK) ⇒ BLACK(JACK),

in the second case, it will be:

CROW(JACK) & BLACK(JACK).

When one is learning from implications (or, more generally, from theorems), the intuitive
behaviour consists in introducing universally quantified variables [Plotkin 1970].

From the knowledge

CROW(JACK) ⇒ BLACK(JACK)

CROW(JOCK) ⇒ BLACK(JOCK),

one is tempted to infer

∀x [CROW(x) ⇒ BLACK(x)]

because it gives a good representation of the sentence "All crows are black".

When one is learning from conjunctions, it is counter-intuitive to introduce universal quantifiers.
From the knowledge

CROW(JACK) & BLACK(JACK)

CROW(JOCK) & BLACK(JOCK),

one is not tempted to infer

∀x [CROW(x) & BLACK(x)]

because it represents the sentence "All objects are black crows", which is nowhere in the examples.

Even more convincingly, one cannot learn that

∀x∀y [BLACK(x) & WHITE(y)]

from

BLACK(CROW) & WHITE(SWAN)

BLACK(JAY) & WHITE(DOVE)

since the examples contain no contradiction while ∀x∀y [BLACK(x) & WHITE(y)] does.

Nevertheless, it may seem a bit awkward to "infer" from them

∃x∃y [BLACK(x) & WHITE(y)]

since this existential theorem is nothing but a mere logical deduction from either example.

Suppose that you start from a relation R(A, B) among instances. It is trivial to understand that, most
often, the relation ∀x∀y [R(x, y)] is wrong. One has to find a relation of the type

∀x∀y [P(x) & Q(y) ⇒ R(x, y)]

where P and Q describe those variables for which R is TRUE, but in general, one has no way to find P
and Q.

That explains why some authors define

P(A) generalizes into ∃x P(x) iff ∃x [P(A) ⇒ P(x)]

Since this implication is a tautology, this definition is also very much disputable. The idea of
generalization conveys some increase in the information content of the generalized formula. Here, on
the contrary, generalization would take place and seemingly decrease the information content of the
generalized formula. This last point will be detailed in section 1.3.2.3 below.

Let us now see how these problems are handled in each particular case.

1.3.1 - Theorem Learning

When one is learning from example theorems, one will introduce universally quantified variables. This
gives rise to two different difficulties. Both of them are extremely deep problems and their answers
belong to long term research. Nevertheless, we shall now describe them briefly.


Firstly, there do exist theorems that contain existential quantifiers, and the recognition of these
existential quantifiers is a very difficult problem which amounts to function synthesis.

Secondly, the examples usually do not specify the domain of validity of the theorem (i.e., one
usually learns false theorems from examples), and the determination of this domain amounts to
predicate synthesis.

1.3.1.1 - Inventing Skolem Functions

When some variables are existentially quantified, there is always a hidden function which will be
extremely difficult to make explicit.

Suppose that one is learning from a set of examples like: 0 + 1 = 1, 0 + 2 = 2, ..., 1 + 0 = 1, ...,
1 + 1 = 2, etc., where + is an unknown symbol. It would be wrong to infer formulae all the variables of
which are universally quantified, like: ∀x∀y∀z [x + y = z].

Let us now suppose that it has been possible, say by using suitable counter-examples, to guess that one
possible formula is ∀x∀y∃z [x + y = z].

Obviously, this last theorem, although true, does not solve the learning problem implicitly stated by the
above sequence of examples: "invent a definition of a function + that fits with this set of input-output
examples".

When a theorem contains existential quantifiers, the first goal is, of course, to recognize which
variables are under their scope. In general, as the example shows, this is not the ultimate goal, which
is rather: "remove those existentially quantified variables by synthesizing a suitable Skolem function
that fits with the examples".

Instead of ∀x∀y∃z [x + y = z], one rather wants to find a function f such that ∀x∀y [x + y = f(x, y)],
and f realizes the operation +.
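To make the goal concrete, the following sketch checks candidate Skolem functions against the input-output examples. It is a naive generate-and-test illustration, not one of the synthesis methodologies cited in this section; the candidate set and all names are assumptions.

```python
# Input-output examples for the unknown symbol +, as ((x, y), result) pairs.
examples = [((0, 1), 1), ((0, 2), 2), ((1, 0), 1), ((1, 1), 2)]

# Hypothetical candidate definitions for the Skolem function f(x, y).
candidates = {
    "first": lambda x, y: x,
    "max": max,
    "sum": lambda x, y: x + y,
}


def consistent(f, examples):
    """True when f reproduces the output of every example."""
    return all(f(*args) == out for args, out in examples)


fitting = [name for name, f in candidates.items() if consistent(f, examples)]
```

Only the candidate named `sum` survives: `max` agrees with the first three examples but fails on 1 + 1 = 2. The difficulty discussed in the text lies in producing the candidates, which this sketch takes as given.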

Several methodologies that propose an approach to the solution of this problem can be found in
[Biermann & al. 1984]. Recently, an original approach has been developed and implemented in our group
[Franova 1985, 1986].

1.3.1.2 - Finding Domain Definitions

Let us suppose now that we are in the simpler case where all quantifications are universal ones. It
does not mean that the theorem is true in all possible interpretations: one must also find the domain of
definition of the variables.

Suppose  that  the  system  is  to  learn  rules  concerning  the  economic  relationships  between  countries. 

For  example,  it  will  be  told  that  : 

If France is a buyer of video recorders, and Japan produces them, then France is a potential buyer of
video recorders from Japan.

A formal way of representing this sentence is:

E₁ : NEEDS(FRANCE, VIDEOS) & PRODUCES(JAPAN, VIDEOS) ⇒ POSSBUY(FRANCE, VIDEOS, JAPAN).

Assume that we also have the second example:

E₂ : NEEDS(BELGIUM, COMPUTERS) & PRODUCES(USA, COMPUTERS) ⇒ POSSBUY(BELGIUM, COMPUTERS, USA).

It is then easy to find the following generalization:

G : ∀x∀y∀u NEEDS(x, u) & PRODUCES(y, u) ⇒ POSSBUY(x, u, y).
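One way to obtain G mechanically is anti-unification in the style of [Plotkin 1970]: corresponding sub-terms that differ are replaced by a variable, and the same pair of sub-terms always receives the same variable. The sketch below assumes a nested-tuple encoding of formulas and generated variable names `v0`, `v1`, ...; both are illustrative choices, not the representation used by the systems discussed here.

```python
def anti_unify(s, t, table):
    """Least general generalization of two terms encoded as nested tuples."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same symbol and arity: generalize argument by argument.
        return (s[0],) + tuple(anti_unify(a, b, table)
                               for a, b in zip(s[1:], t[1:]))
    # Differing sub-terms: reuse one variable per distinct pair.
    if (s, t) not in table:
        table[(s, t)] = "v%d" % len(table)
    return table[(s, t)]


E1 = ("=>",
      ("&", ("NEEDS", "FRANCE", "VIDEOS"), ("PRODUCES", "JAPAN", "VIDEOS")),
      ("POSSBUY", "FRANCE", "VIDEOS", "JAPAN"))
E2 = ("=>",
      ("&", ("NEEDS", "BELGIUM", "COMPUTERS"), ("PRODUCES", "USA", "COMPUTERS")),
      ("POSSBUY", "BELGIUM", "COMPUTERS", "USA"))

G = anti_unify(E1, E2, {})
```

The result has the shape of G, with `v0`, `v1`, `v2` playing the roles of x, u and y. Nothing in this computation says over which domains these variables may range.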


These hierarchies describe the possible domains of the variables; this information can be introduced as
a condition to the application of the rule:

∀x∀y∀u ( IF COUNTRY(x) & COUNTRY(y) & PRODUCER-GOODS(u)

THEN [ NEEDS(x, u) & PRODUCES(y, u) ⇒ POSSBUY(x, u, y) ] ).

A greater refinement is of course possible when information more detailed than the two above
taxonomies is available [Kodratoff 1985], [Kodratoff 1986a].

1.3.2 - Concept Learning or Different Ways to Use a Recognition Function

In this section, let us assume that we do not learn rules or theorems, but conjunctions of atoms.

This kind of learning aims at obtaining a formula, called recognition function, that characterizes the
micro-world to which the examples belong.

When quantifiers are introduced, the recognition process will work by using a deduction principle.

In our example, we shall use refutation and write the recognition as a deduction in a PROLOG
program, using Edinburgh notation [Clocksin & Mellish 1981].

Suppose that we start from

E₁ : "This scene contains KOKO which is a white swan",

E₂ : "This scene contains KIKI which is a white swan".

These examples are interpreted as a description of some scene "This scene".

They are then given the form:

E₁' : SWAN(KOKO) & WHITE(KOKO)

E₂' : SWAN(KIKI) & WHITE(KIKI).

Obviously, one aims here at recognizing scenes that contain a white swan.

This example has been chosen on purpose to be opposed to the "black crow" one, since not all swans
are white.

1.3.2.1 - Universal quantification

All variables are universally quantified; the recognition "function" has therefore the form: ∀x P(x).

It will be used as a recognition function of a scene, say S₁, as follows.

One says that

∀x P(x) recognizes S₁ when one can prove ∀x P(x) ⇒ S₁.

Using refutation for the proof of ∀x P(x) ⇒ S₁ amounts to proving that its negation leads to a
contradiction, i.e. that

∀x P(x) & ¬S₁

leads to a contradiction.

E₁' and E₂' will generalize to

∀x P(x) = ∀x [SWAN(x) & WHITE(x)]

which will be used to recognize a scene made of a white swan.

Suppose that we want to check that

S₁ = [SWAN(JACKO) & WHITE(JACKO)]

is recognized.

One has to prove that

SWAN(x) :-

WHITE(x) :-

:- SWAN(JACKO), WHITE(JACKO)

leads to a contradiction, which is of course the case.

Therefore, the scene is recognized by this recognition function.

This kind of generalization has the property (in some cases it is a drawback, in some others it may be
an advantage) that it will fail to recognize a scene with additional details.

Consider a scene with a white swan and a Renault car. One will have to find a contradiction in the set:

SWAN(x) :-

WHITE(x) :-

:- SWAN(JACKO), WHITE(JACKO), CAR(RENAULT)

and this will not be possible.

In conclusion, one must use universally quantified variables when one looks for a recognition function
that recognizes whole scenes. One must not use them when the recognition function is supposed to
recognize sub-parts of a scene.
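For unary predicates, recognition by a universally quantified conjunction reduces to a simple test: every atom of the scene must be derivable from the facts "P(x) for all x". The sketch below assumes that the generalization is encoded as a set of predicate names and the scene as a list of ground atoms `(predicate, constant)`; this encoding and the function name are hypothetical.

```python
def recognizes_universal(pattern, scene):
    """pattern: predicates asserted for every x, e.g. {"SWAN", "WHITE"}.
    scene: ground atoms (predicate, constant).
    The scene is recognized iff each of its atoms follows from the pattern."""
    return all(pred in pattern for pred, _ in scene)


pattern = {"SWAN", "WHITE"}
swan_only = [("SWAN", "JACKO"), ("WHITE", "JACKO")]
swan_and_car = swan_only + [("CAR", "RENAULT")]
```

The scene `swan_only` is recognized, while `swan_and_car` is not, because `CAR(RENAULT)` cannot be derived from the pattern.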

1.3.2.2 - Existential quantification

All variables are existentially quantified; the recognition "function" has therefore the form: ∃x P(x).

It will be used as a recognition function of a scene, say S₁, as follows.

One says that

∃x P(x) recognizes S₁ when S₁ ⊢ ∃x P(x),

i.e., when one can deduce ∃x P(x) from S₁.

Using refutation for the proof of S₁ ⊢ ∃x P(x) amounts to proving that adding the negation of ∃x P(x)
to S₁ leads to a contradiction, i.e. that

S₁ & ¬∃x P(x)

leads to a contradiction.


E₁' and E₂' will generalize to

∃x P(x) = ∃x [SWAN(x) & WHITE(x)],

therefore one has:

¬∃x P(x) = ∀x ¬P(x) = ∀x ¬[SWAN(x) & WHITE(x)]


Suppose that we want to check that

S₁ = [SWAN(JACKO) & WHITE(JACKO)]

is recognized.

One has to prove that

SWAN(JACKO) :-

WHITE(JACKO) :-

:- SWAN(x), WHITE(x)

leads to a contradiction, which is of course the case.

Therefore, the scene is recognized by this recognition function.
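With existential quantification, the roles are reversed: the scene supplies the facts and the pattern is the goal. A minimal sketch, assuming unary predicates, a pattern given as a set of predicate names sharing one variable x, and a scene given as ground atoms `(predicate, constant)`; the encoding and names are hypothetical.

```python
def recognizes_existential(pattern, scene):
    """True iff some constant c of the scene satisfies every predicate of the
    pattern, i.e. the goal ':- P1(x), ..., Pn(x)' succeeds on the scene facts."""
    facts = set(scene)
    constants = {c for _, c in scene}
    return any(all((pred, c) in facts for pred in pattern) for c in constants)


pattern = {"SWAN", "WHITE"}
swan_and_car = [("SWAN", "JACKO"), ("WHITE", "JACKO"), ("CAR", "RENAULT")]
two_objects = [("SWAN", "JACKO"), ("WHITE", "KIKI")]
```

The scene `swan_and_car` is recognized because the extra fact is irrelevant to the goal, whereas `two_objects` is rejected because no single object is both a swan and white.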


1.3.2.3 - Conclusion


Existential quantification might have been felt as counter-intuitive because "nothing is learned" from it.
This is not true, for the following reasons.

- The existential theorem ∃x P(x) learned from a set of examples {E₁, ..., Eₙ} must be deducible from
each Eᵢ. It therefore catches, as it should, some of the common features of the examples.

It will recognize a scene by its sub-parts.

Consider again a scene with a white swan and a Renault car. One will now have to find a
contradiction in the set:

SWAN(JACKO) :-

WHITE(JACKO) :-

CAR(RENAULT) :-

:- SWAN(x), WHITE(x)

and the presence of a Renault car is no longer harmful.

- This definition is actually very near to the intuitive one. In particular, it contains Michalski's
generalization rules [Michalski & Stepp 1983], [Michalski 1984]. For instance, the example of section
1.3.2.2 shows how it contains the "dropping condition" rule.

- This approach has been used in [Kodratoff 1985] for the specific case of counter-examples. It has
been generalized by Nicolas [Nicolas 1986a, 1986b], who uses a theorem prover in order to perform
inductive learning, which may seem surprising at first sight.

We have developed another way to define generalization, by extending the classical definition of term
generalization, as seen in the next section.

In order to prove the necessity of introducing these new concepts, let us consider the following
counter-example to the methods issued from Modus Ponens, as presented in sections 1.3.1 and 1.3.2.

1.3.3 - A Counter-example

Let us now give a "counter-example" to the deductive definition, in the sense that a best
generalization is not found by it.

1.3.3.1 - A definition of "best generalization" issued from Modus Ponens

There is an obvious way to define a best generalization when one quantifies the variables existentially.
The best generalization is the one which is the "nearest" to all the examples, but contains the
information they have in common.

Let {Eᵢ} be a set of examples, and {Gⱼ} be a set of possible generalizations, i.e., ∀i, one must be able
to prove that Eᵢ ⊢ Gⱼ, for each of the Gⱼ's.

Since one infers the generalizations from the examples, it is obvious that one must define the best
generalization among the Gⱼ's as being the most specific one, i.e. the one, if it exists, from which all
others can be inferred.

1.3.3.2 - The counter-example

Suppose that one starts from the two examples

E₁ : ON(A, B) & NEAR(B, C)

E₂ : ON(D, E)

with the theorems

∀x∀y [ON(x, y) ⇒ NEAR(x, y)]

∀x∀y [NEAR(x, y) ⇔ NEAR(y, x)]

Using these theorems, one can show that the two following potential generalizations

G₁ : ∃x∃y [ON(x, y)]

G₂ : ∃x∃y∃z [ON(x, y) & NEAR(y, z)]

are equivalent relative to our definition, since G₁ ⇔ G₂.

Nevertheless, the associated generalizations obtained by substitution techniques, as seen below, are:

f₁ : ON(x, y)

f₂ : ON(x, y) & NEAR(y, z)

They are not equivalent since, using the theorems, one can show that f₁ is equivalent to

f₁' : ON(x, y) & NEAR(y, x).

f₂ is clearly (from definition 1.4.2 below) more general than f₁, since the substitution
σ = (x ← x, y ← y, z ← x) is such that σf₂ = f₁'.
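The substitution argument can be replayed directly. The sketch below assumes formulas encoded as lists of atoms `(predicate, arg, ...)` over the variables x, y, z, and uses a brute-force search over variable-to-variable substitutions; the helper names are hypothetical.

```python
from itertools import product


def apply_subst(subst, formula):
    """Apply a variable substitution to every argument of every atom."""
    return [(atom[0],) + tuple(subst.get(a, a) for a in atom[1:])
            for atom in formula]


def exists_subst(src, dst, symbols=("x", "y", "z")):
    """Brute force: is there a substitution s over `symbols` with s(src) == dst?"""
    variables = sorted({a for atom in src for a in atom[1:]})
    return any(apply_subst(dict(zip(variables, image)), src) == dst
               for image in product(symbols, repeat=len(variables)))


f2 = [("ON", "x", "y"), ("NEAR", "y", "z")]
f1_prime = [("ON", "x", "y"), ("NEAR", "y", "x")]
sigma = {"x": "x", "y": "y", "z": "x"}
```

Applying `sigma` to `f2` yields `f1_prime`, while no substitution over these variables maps `f1_prime` back onto `f2`, so the two substitution-based generalizations are strictly ordered even though G₁ and G₂ are deductively equivalent.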


1.4 - Term generalization


1.4.1  -  Terms 


Let V be a countable set of variables and F a family of functions indexed by the natural numbers.
When a function f belongs to Fₙ, one says that the arity of f is n. The set F₀ of functions of arity zero
is called the set of the constants.

The set of terms on V and F is defined by

(i) v ∈ V is a term

(ii) f(t₁, ..., tₙ) is a term iff f ∈ Fₙ and t₁, ..., tₙ are terms.

Intuitively, the set of terms is a set of expressions built with functions of some arity, constants and
variables.

1.4.2  -  Generalization 

The term t₁ is more general than the term t₂, denoted by t₁ ≤ t₂, iff there exists a substitution σ such
that σt₁ = t₂. This definition does not take into account the properties of the functions. One describes
these properties by a "theory" ε, and one defines a generalization modulo this theory.
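The purely syntactic relation t₁ ≤ t₂ amounts to one-way matching. The following sketch assumes that variables are lowercase strings, that compound terms are tuples `(function, arg, ...)`, and that any other value is a constant; the function name `match` is an illustrative choice.

```python
def is_var(t):
    """Assumed convention: variables are lowercase strings."""
    return isinstance(t, str) and t.islower()


def match(t1, t2, subst=None):
    """Return a substitution s with s(t1) == t2, or None.
    When a substitution exists, t1 is more general than t2."""
    subst = {} if subst is None else subst
    if is_var(t1):
        if t1 in subst:
            return subst if subst[t1] == t2 else None
        return {**subst, t1: t2}
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        if t1[0] != t2[0] or len(t1) != len(t2):
            return None
        for a, b in zip(t1[1:], t2[1:]):
            subst = match(a, b, subst)
            if subst is None:
                return None
        return subst
    return subst if t1 == t2 else None
```

For example, `match(("+", "x", "y"), ("+", 2, 3))` binds x to 2 and y to 3, whereas `match(("+", 2, 3), ("+", 3, 2))` fails: without a theory stating that + is commutative, the two terms are unrelated.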

1.4.3 - ε-generalization

Let ε be a set of axioms which express the properties of the functions.

When one needs to use these axioms in order to recognize the equality of two terms, one says that they
are ε-equal.

For instance, the two terms t₁ = (2 + 3) and t₂ = (3 + 2) are not considered as "equal" but as
"ε-equal" because one needs to use the axiom of commutativity:

∀x ∀y [(x + y) = (y + x)]

in order to recognize that t₁ =ε t₂.

This definition may seem counter-intuitive, but it is necessary to single out the use of axioms in the
context of an automatic generation of generalizations because their use may lead to infinite
computation loops (using the axiom in one direction and then in the other one). This kind of problem
has been much studied, see for instance [Stickel 1981], [Hsiang 1982].

Let =ε denote ε-equality.

A term t₁ is more general than a term t₂ in the theory ε iff there exist t₁' and t₂' and a σ such that
t₁ =ε t₁', t₂ =ε t₂' and σt₁' = t₂'.

Depending on ε, it may be that the above definition of ε-generalization is not consistent with its
implicit future use for defining an (at least partial) order. Using some of the properties, one may find
t₁' and t₂' such that t₁ =ε t₁' and t₂ =ε t₂', and there is a σ such that σt₁' = t₂'.

Nevertheless, it may well also be that, using other properties, one can find t₁'' and t₂'' such that
t₁ =ε t₁'' and t₂ =ε t₂'', and there exists σ' such that σ't₂'' = t₁'', even when t₁ and t₂ are not
ε-equal [Kodratoff & Ganascia 1986].

Since we want to use the properties of the functions, and further define the generality of formulas
(therefore using the properties of our connectors), it is necessary to find a definition of
ε-generalization that avoids this difficulty.

1.4.4 - Example of ε-generalization (where atomic formulas are treated like terms)

Let us suppose that we work in a world of objects which have a color and that the following knowledge
is available:

∀x ∃y COLOR(y, x)

It states that each object x has a color named y. In addition, RED is a kind of COLOR and this
information is supposed to be known as well. This knowledge allows us to transform any atomic formula
like RED(x) into an instance of the more general atomic formula COLOR(RED, x).

Let us compare the generality of the concepts 'red square' C₁ and 'square' C₂.

C₁ = SQUARE(x) & RED(x)

C₂ = SQUARE(x)

Applying the above theorem, one knows that any x of C₂ has an unknown color, say y. Therefore C₂ is
equivalent to C₂' = SQUARE(x) & COLOR(y, x). Based on the fact that RED is more particular than
COLOR, one can find C₁' =ε C₁, C₁' = SQUARE(x) & COLOR(RED, x). Now, the usual term definition of
generality can be applied since σC₂' = C₁' with σ = (y ← RED). Therefore C₂ is more general than C₁
in the theory which contains the above information.
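The three steps of this example (rewrite C₁, complete C₂, then apply the ordinary substitution test) can be sketched as follows. The encoding of concepts as lists of atoms and the helper names are assumptions made for the illustration; the two rewriting functions hard-code the background knowledge stated above.

```python
KINDS_OF_COLOR = {"RED"}  # assumed encoding of "RED is a kind of COLOR"


def rewrite_colors(concept):
    """Replace each atom RED(x) by the equivalent atom COLOR(RED, x)."""
    return [("COLOR", a[0], a[1]) if a[0] in KINDS_OF_COLOR else a
            for a in concept]


def add_unknown_color(concept, obj, var):
    """Use 'every object has a color' to add COLOR(var, obj), var being fresh."""
    return concept + [("COLOR", var, obj)]


def substitute(subst, concept):
    """Apply a substitution to the arguments of every atom."""
    return [(a[0],) + tuple(subst.get(t, t) for t in a[1:]) for a in concept]


C1 = [("SQUARE", "x"), ("RED", "x")]
C2 = [("SQUARE", "x")]

C1_prime = rewrite_colors(C1)               # SQUARE(x) & COLOR(RED, x)
C2_prime = add_unknown_color(C2, "x", "y")  # SQUARE(x) & COLOR(y, x)
sigma = {"y": "RED"}
```

Applying `sigma` to `C2_prime` gives exactly `C1_prime`, which is the term-level test establishing that C₂ is more general than C₁ in this theory.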


1.5 - Definition of Formula Generalization modulo a theory

Let E₁ and E₂ be two formulas and ε an equational theory.

1.5.1  -  Generalized  formula 

We say that formula E₁ is a generalization of formula E₂ if Condition 1 is fulfilled.

Condition 1 : there exists E₁' such that E₁' =ε E₁ and there is σ such that σE₁' =ε E₂.

This condition states that there exists E₁', equivalent to E₁, and that E₁', considered as a term, is more
general than E₂ considered as a term.

The next definition gives another condition, which ensures that formula generality is a partial ordering.

1.5.2  -  Generality  relation  between  two  formulas. 

We shall say that E₁ is more general than E₂ when Condition 1 and Condition 2 are fulfilled.
Condition 1 is as above and

Condition 2 : For all E₂' such that E₂' =ε E₂, if there exists σ'' such that σ''E₂' =ε E₁, then E₁ =ε E₂.

This second condition states that the first condition can actually be used for ordering the formulas.

It says that if there is an E₂' which is equivalent to E₂ and which is more general (as a term) than E₁,
then all three E₁, E₂, E₂' must be equivalent.

Some theoretical consequences of Condition 2 have been studied in [Kodratoff & Ganascia 1986] under
the name of ε-implication.

We shall rather explain how one can make an algorithm out of definition 1.5.2.

One needs to find the transformed E₁ and E₂, called E₁' and E₂' in the above definition. We have
called this work: Structural Matching [Kodratoff 1983].


1.6  -  Structural  Matching  (  SM  ) 


1.6.1  -  Definition 

Two  formulas  structurally  match  if  they  are  identical  except  for  the  constants  and  the  variables  that  in¬ 
stantiate  their  predicates. 

More formally:

Let E₁ and E₂ be two formulas.

E₁ structurally matches E₂ iff there exists a formula C and there exist σ₁ and σ₂ such that

1- σ₁C = E₁ and σ₂C = E₂.

2- σ₁ and σ₂ never substitute a variable by a formula or a function.

It must be understood that SM may be difficult, up to undecidable. Nevertheless, in most cases, one can
use the information coming from the other examples in order to know how to orient the proofs
necessary to the application of this definition.
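When no property has to be used, checking the definition is straightforward. The sketch below tests whether two conjunctions already match structurally and, if so, builds C together with σ₁ and σ₂. It assumes atoms encoded as tuples `(predicate, arg, ...)`, listed in the same order in both formulas, with arguments that are constants or variables only; it does not perform the theorem-driven transformations that make SM difficult.

```python
def structural_match(e1, e2):
    """Return (C, s1, s2) with s1(C) == e1 and s2(C) == e2, or None."""
    if len(e1) != len(e2):
        return None
    pairs, body = {}, []
    for a1, a2 in zip(e1, e2):
        if a1[0] != a2[0] or len(a1) != len(a2):
            return None                      # predicates differ: no match
        args = []
        for t1, t2 in zip(a1[1:], a2[1:]):
            # One variable of C per distinct pair of corresponding arguments.
            args.append(pairs.setdefault((t1, t2), "v%d" % len(pairs)))
        body.append((a1[0],) + tuple(args))
    s1 = {v: t1 for (t1, _), v in pairs.items()}
    s2 = {v: t2 for (_, t2), v in pairs.items()}
    return body, s1, s2


E1 = [("SWAN", "KOKO"), ("WHITE", "KOKO")]
E2 = [("SWAN", "KIKI"), ("WHITE", "KIKI")]
result = structural_match(E1, E2)
```

For the two swan scenes, C is `SWAN(v0) & WHITE(v0)` with `v0` bound to KOKO and to KIKI respectively. Two formulas whose predicates differ, such as `SQUARE(A)` and `TRIANGLE(C)`, are rejected by this check until semantic knowledge is used to rewrite them.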

1.6.2 - SMizing two formulas

SM may well fail, whereas the effects of the attempt to put into SM may still be interesting.

We say that two formulas have been SMized when every possible property has been used in order to
put them into SM.

When the SM is a success, then SMizing is identical to putting into SM.

When the SM is a failure, SMizing keeps the best possible result in the direction of matching formulas.

1.6.3 - A simple example of (successful) Structural Matching

Consider the two following examples.


[Figure: the two example scenes E₁ and E₂]


Using his intuition, the reader may notice that he can find two different generalizations from these
examples.

He sees that either

- there are two different objects touching each other, and a small polygon

- there are two different objects touching each other, one of them is a square.

Both generalizations are true and there is no reason why one of them should be chosen rather than the
other. We shall now see that one of the interesting features of SM is that it keeps all the available
information, and therefore constructs a formula containing both of the above two "concepts".

The examples can be described by the following formulas

E₁ = SQUARE(A) & CIRCLE(B) & ON(A, B) & SMALL(A) & BIG(B)

E₂ = TRIANGLE(C) & SQUARE(D) & TOUCH(C, D) & SMALL(C) & BIG(D)

Let  us  suppose  that  the  following  hierarchy  is  provided  to  the  system. 


        POLYGON
        /     \
   SQUARE    TRIANGLE

together with the theorems

∀x ∀y [ON(x, y) ⇒ TOUCH(x, y)]

∀x ∀y [TOUCH(x, y) ⇔ TOUCH(y, x)]

This taxonomy and the theorems represent our semantical knowledge about the micro-world in which
learning is taking place.

The  SM  of  £,  and  £2  proceeds  by  transforming  them  into  equivalent  formulas  E{  and  Ed,  such  that 
£,'  is  equivalent  to  £,,  and  Ed  is  equivalent  to  £-  in  this  miao-worid  (  ue.,  taking  into  account  its 
semanhcs  ). 

When the process is completed, E1' and E2' are made of two parts.

One is a variabilized version of E1 and E2. It is called the body of the SMized formulas. When SM
succeeds, the bodies of E1' and E2' are identical.

The other part, called the bindings (of the variables), gives all the conditions necessary for the body
of each Ei' to be identical to the corresponding Ei.

The algorithm that constructs E1' and E2' is explained in [Kodratoff 1983, Kodratoff & Ganascia 1986,
Kodratoff et al. 1984]. It has been implemented several times under the name of AGAPE or MAGGY.

In our example, it would find

Body of E1' =

POLYGON(u, y) & SQUARE(x) & CONVEX(v1, v2, z) & ON(y, z) & TOUCH(y, z) & SMALL(y) & BIG(z)

Bindings of E1' =

((x = y) & (y ≠ z) & (x ≠ z) & (v1 = ELLIPSOID) & (v2 = CIRCLE) & (u = SQUARE) & (x = A) & (z = B))

Body of E2' =

POLYGON(u, y) & SQUARE(x) & CONVEX(v1, v2, z) & TOUCH(y, z) & SMALL(y) & BIG(z)

Bindings of E2' =

((x ≠ y) & (y ≠ z) & (x = z) & (v1 = POLYGON) & (v2 = SQUARE) & (u = TRIANGLE) & (x = D) & (y = C))


The reader can check that E1' and E2' are equivalent to E1 and E2.

E1' and E2' contain exactly the information extracted from the hierarchy and the theorems which is
necessary to put the examples into SM.

For instance, in E1', the expression 'POLYGON(u, y)' means that there is a polygon in E1, and since
we have the binding (u = SQUARE), it says that this polygon is a square, which is redundant in view
of the fact that SQUARE(x) & (x = y) says that x is a square and is the same as y. This redundancy is
not artificial when one considers the polygon in E2, which is a TRIANGLE.

This example shows well that, once this SM step has been performed, the generalization step itself
becomes trivial: we keep in the generalization all the bindings common to the SMized formulas and
drop all those not in common.

In other words, this SM technique allows one to reduce the well-known generalization rules [Michalski
1983, 1984] to the single "dropping condition rule", which becomes legal on SMized formulas. All the
inductive power is in the dropping condition rule; all other rules are purely deductive. We must
confess that a formal proof of the above statement is still under research.

The generalization of E1 and E2 is therefore

Eg = POLYGON(u, y) & SQUARE(x) & CONVEX(v1, v2, z) & TOUCH(y, z) & SMALL(y) & BIG(z)
with bindings (y ≠ z)

In "English", this formula means that there are two different objects (named y and z), y and z touch
each other, y is a small polygon, z is a big convex, and there is a square (named x) which may be
identical to y or z.
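
The generalization step on SMized formulas can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the AGAPE or MAGGY implementation: atoms and bindings are written as plain strings transcribed from the example, the names `E1_PRIME`, `E2_PRIME` and `drop_non_common` are hypothetical, and the SM step itself is taken as already done.

```python
# Minimal sketch of the "dropping condition rule" applied to SMized formulas.
# Each formula is a pair (body atoms, bindings), both stored as sets of strings.

E1_PRIME = (
    {"POLYGON(u,y)", "SQUARE(x)", "CONVEX(v1,v2,z)", "ON(y,z)",
     "TOUCH(y,z)", "SMALL(y)", "BIG(z)"},
    {"x=y", "y!=z", "x!=z", "v1=ELLIPSOID", "v2=CIRCLE",
     "u=SQUARE", "x=A", "z=B"},
)
E2_PRIME = (
    {"POLYGON(u,y)", "SQUARE(x)", "CONVEX(v1,v2,z)",
     "TOUCH(y,z)", "SMALL(y)", "BIG(z)"},
    {"x!=y", "y!=z", "x=z", "v1=POLYGON", "v2=SQUARE",
     "u=TRIANGLE", "x=D", "y=C"},
)


def drop_non_common(f1, f2):
    """Keep only the atoms and bindings shared by both SMized formulas."""
    (body1, bind1), (body2, bind2) = f1, f2
    return body1 & body2, bind1 & bind2


body, bindings = drop_non_common(E1_PRIME, E2_PRIME)
print(sorted(body))
print(sorted(bindings))
```

Plain set intersection is enough here because SM has already given both formulas the same variables; all the inductive work is in what gets dropped.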

It can be easily guessed that using theorems can lead to many difficulties, since one enters the realm
of Theorem Proving, which is well-known for being a good source of yet unsolved problems.

In the case of SM, one is driven by the need to put the examples into a similar form, and the usual
difficulties of Theorem Proving are somewhat smoothed out.

We cannot formally prove this point, but the following example, taken from [Vrain 1986], can at least
illustrate our claim.

1.7 - Using theorems to improve generalization

Starting from two examples that have no common predicates, we show that they nevertheless have a
common generalization, found by using theorems that link the predicates.

Let the examples be

E1 = MAMMALIAN(A) & BRED_ANIMAL(A)

E2 = TAME(B) & VIVIPAROUS(B)

to which the following theorems are joined

R1: ∀x [MAMMALIAN(x) & BRED_ANIMAL(x) ⇒ TAME(x)]
R2: ∀x [TAME(x) & VIVIPAROUS(x) ⇒ MAMMALIAN(x)]

R3: ∀x [TAME(x) ⇒ HARMLESS(x)]

The first step of SM is here trivial: we replace the constants by a variable x, and obtain the equivalent
examples:


E1' = MAMMALIAN(x) & BRED_ANIMAL(x) [EQ(x, A)]
E2' = TAME(x) & VIVIPAROUS(x) [EQ(x, B)]


Since the predicates have no common occurrence, we consider the first predicate of E1' (this ordering is
not significant, and just follows the one in which the examples are given): MAMMALIAN. We see
that we can deduce this predicate from E2', using the rule R2. We get:

E1'' = MAMMALIAN*(x) & BRED_ANIMAL(x) [EQ(x, A)]

E2'' = TAME(x) & VIVIPAROUS(x) & MAMMALIAN**(x) [EQ(x, B)]

The MAMMALIAN of E1' has been treated, this is why it is marked by an * in E1''. The one of E2'' is
issued from the use of theorems, this is why it is marked by **.

Using again the order in which the examples are given, the next non-marked predicate is
BRED_ANIMAL.

- No rule can be applied to E2'' to make explicit the presence of BRED_ANIMAL in it.

- Nevertheless, we remark that applying the rule R1 to E1'' uses the concerned predicate
BRED_ANIMAL. Checking the effect of this application, we see that it generates the atomic formula
TAME(x) and that there is an occurrence of TAME(x) in E2'' which matches it. Therefore, we
conclude that we must apply R1 to E1''.

One obtains

E1''' = MAMMALIAN*(x) & BRED_ANIMAL*(x) & TAME**(x) [EQ(x, A)]
E2''' = TAME*(x) & VIVIPAROUS(x) & MAMMALIAN**(x) [EQ(x, B)]

Now, the only un-matched predicate is VIVIPAROUS in E2'''.

- No rules can be applied to E1''' to make its presence explicit.

- The only rule which can be applied in E2''', relative to VIVIPAROUS, is R2. But it would introduce
the atomic formula MAMMALIAN(x), which is already matched since its instances are starred.

No other rule can be applied; we star the predicate VIVIPAROUS to remember that it has already
been dealt with, obtaining:

E1'''' = MAMMALIAN*(x) & BRED_ANIMAL*(x) & TAME**(x) [EQ(x, A)]
E2'''' = TAME*(x) & VIVIPAROUS*(x) & MAMMALIAN**(x) [EQ(x, B)]

All possible occurrences have been dealt with; a complete SM is not possible, therefore the SMizing
operation stops here.

Now, the generalization step is trivial: one drops the non-common occurrences, obtaining the
generalization

G = TAME(x) & MAMMALIAN(x)
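
The walk-through above can be approximated by the following Python sketch. It is a strong simplification made for illustration, not the system of [Vrain 1987]: examples are reduced to sets of predicate names over the single variable x, the stars are replaced by set membership, and a theorem is applied to an example only when its conclusion already occurs in the other example, which is our reading of "improving the SMizing state". The names `smize` and `generalize` are hypothetical.

```python
# Sketch: saturate two examples with theorems, but only when this improves matching.
E1 = {"MAMMALIAN", "BRED_ANIMAL"}
E2 = {"TAME", "VIVIPAROUS"}
RULES = [
    ({"MAMMALIAN", "BRED_ANIMAL"}, "TAME"),   # R1
    ({"TAME", "VIVIPAROUS"}, "MAMMALIAN"),    # R2
    ({"TAME"}, "HARMLESS"),                   # R3
]


def smize(e1, e2, rules):
    """Add deducible predicates to each example when the other example has them."""
    e1, e2 = set(e1), set(e2)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            for target, other in ((e1, e2), (e2, e1)):
                useful = conclusion in other and conclusion not in target
                if useful and premises <= target:
                    target.add(conclusion)
                    changed = True
    return e1, e2


def generalize(e1, e2, rules):
    """Drop the non-common predicates of the SMized examples."""
    s1, s2 = smize(e1, e2, rules)
    return s1 & s2


print(sorted(generalize(E1, E2, RULES)))
```

Since a conclusion is added to an example at most once, the loop always terminates. R3 is never applied, because HARMLESS occurs in neither example and so would not improve the match.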

This example shows well how potentially infinite proof loops can be easily avoided, simply because they
do not improve the SMizing state of the examples.

More generally, one can use theorem proving techniques in order to improve the degree of similarity
detected among the examples.

Such a system is under development in our group [Vrain 1987]. It is not the concatenation of a
classical theorem prover and of generalization algorithms, but is rather strongly adapted to the kind of
proofs required by Machine Learning.

As an instance of its peculiarity (and of its incompleteness), it will not allow the same theorem to be
used twice during a given derivation. This is of course a crude way to avoid infinite loops but, as the
above example shows, the corresponding incompleteness is not as wide as one could fear.

2 - SYMBOLIC LEARNING IN A NOISY ENVIRONMENT


In the presence of noise or polymorphy, numeric techniques have proven their usefulness. On this
topic, we want to stress two points.

1 - applying numeric techniques too soon always spoils the understandability of the results, and may
even hamper the efficiency.

2 - when used at the proper time, they become a wonderful tool.

In other words, we do not criticize the use itself of numeric techniques, but their too early use.

The aim of this section is to show that one should, and this is possible, stick to symbolic techniques as
far as possible before beginning to compute combinations of coefficients.

This having been said, we are also quite conscious of the importance of a proper combination of the
coefficients. We are simply a bit puzzled by the huge amount of research done about coefficient
combination, and the tiny amount done to find when and where they must be combined. More details on
this last point can be found in [Duval & Kodratoff 1986] in the case where one draws inferences from
uncertain or noisy clauses.

Recently, and independently, Michalski [Michalski 1986] has introduced the idea of "two-tiered concept
meaning", which is a close relative of the ones presented here.

2.1 - Learning recognition functions in Scene Analysis

Scene Analysis is very typical of a huge development of numerical techniques and of what we shall
call, and try to prove to be, a "hidden" use of symbolic methods.

2.1.1 - Domain Independent Scene Analysis

There is of course a need for methods to go from the pixel level to the level of some descriptors (like
segments, curvature changes, etc.). Up to now, all available methods are purely numeric.

In our own research group, all symbolic learning for scene analysis that has been done either starts
from an already known symbolic description [Kodratoff & Lemerle-Loisel 1984] or from supposedly
noise-free pixel descriptions [Cannat & Kodratoff 1986], [Cannat et al. 1986].

In order to fill the gap between real images and ideal ones, we are presently using a numeric
method due to [Mokhtarian & Mackworth 1986] which seems very promising.

This shows that we have followed a quite classical pattern: start from real pixel images, use numerical
techniques to get a noise-free description in terms of high level descriptors, then use symbolic methods
for interpreting these descriptors.

This approach to Scene Analysis seems to us justified if, and only if, one is supposed to simulate a
system entering a brand-new domain, and forced to discover all forms each time it sees them. The
methods issued from this approach should then be domain independent.

2.1.2 - Domain Dependent Scene Analysis

On the contrary, when one is working in a specific domain, one is always, implicitly or explicitly
(our point is: too often implicitly), using high level knowledge relative to the domain. For instance, if
some kinds of curves are likely to appear, one will develop a special pixel-to-descriptor method to detect
them. Besides, one will introduce special descriptors that will take into account some subtle differences
among those curves, that would be otherwise confused.

This is what we call "hidden use of symbolic methods".

As an example of this undesirable feature, we shall criticize ourselves and cite [Kodratoff &
Lemerle-Loisel 1984] where, for instance, the spatial relationships TOTHERIGHT among forms (that are
represented by circles) are described as follows.

We define the horizontal strip associated to a circle (respectively, its vertical strip) by the portion
of the plane which is between two horizontal (resp. vertical) parallels tangent to the circle.

We then define 6 different sorts of operators describing different ways to express that a circle stands
"to the right of" another circle.

For instance, TOTHERIGHTg(A, B) says that circle A is actually directly above circle B, i.e. that the
center of A is inside the vertical strip of B. TOTHERIGHT,(A, B) says that the center of A is inside the
horizontal strip of B. TOTHERIGHTfi(A, B) says that the center of A is outside both the horizontal and
vertical strip of B, etc.

By defining a strip and using it in the operator definitions, we (in a hidden symbolic way) handle the
noise relative to the position of circle A when its center is approximately on a vertical (resp. horizontal)
line, since the precision with which our operator is defined includes the width of B. On the contrary,
when the center of A is around the limit of the strip, it then becomes extremely noise sensitive, since the
least difference may decide whether it is inside or outside the strip.

This describes a partial handling of the noise, efficient in some situations, very poor in others, which is
quite typical of hidden symbolic noise handling, and shows its limits.

Besides our own, most papers describing a specific application fall into the same trap.

We will not present here a complete solution but simply underline that it can be of two different kinds.

1 - Classically, one can introduce brute force belief coefficients and assign a belief to each descriptor.

2 - As recommended here, one should try to keep available as much as possible of the symbolic
information. Here, this symbolic information can be represented by the fact that some operators are
polymorphic and some others are not.


In our example, TOTHERIGHT^(A, B) and TOTHERIGHTf(A, B) are polymorphic since the center of A
can be near the limit of the strip of B, in which case the slightest error may make one switch from one to
the other.

TOTHERIGHTty(A, B) and TOTHERIGHTf(A, B) are not polymorphic when the two circles do not
intersect; they are when they intersect.
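
To make the notion concrete, here is a small Python sketch of a strip test together with a symbolic polymorphy flag. It is our own illustration, not the operators of the cited paper: the `Circle` class, the two function names and the position error bound `eps` are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    x: float  # abscissa of the center
    y: float  # ordinate of the center
    r: float  # radius


def in_vertical_strip(a, b):
    """True when the center of a lies between the two vertical tangents of b."""
    return abs(a.x - b.x) <= b.r


def is_polymorphic(a, b, eps):
    """True when a position error of at most eps could flip the strip test."""
    return abs(abs(a.x - b.x) - b.r) <= eps


b = Circle(0.0, 0.0, 2.0)
well_inside = Circle(0.5, 5.0, 1.0)
near_limit = Circle(1.75, 5.0, 1.0)

for a in (well_inside, near_limit):
    print(in_vertical_strip(a, b), is_polymorphic(a, b, eps=0.5))
```

Both circles pass the strip test, but only the second is flagged as polymorphic. The flag keeps the information "this answer may switch" in symbolic form instead of folding it into a belief coefficient.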

Polymorphy would then be a better way to treat noise than numeric coefficients since it allows one to
keep more of the semantics of the domain.

In section 2.4 below we describe how polymorphy can even be used to retain most of the information
provided by Mitchell's version spaces.


2.2 - Learning noise-resistant recognition functions

Once more, we want to stress the point that coefficients of some sort are not the exclusive solution to
noise handling.

In this section, the generalization issued from the examples will be seen as a recognition function of
the examples.

When noise is present in a data base, it often introduces some contradictions in it.

We shall now study the special case where these contradictions are actualized by the fact that the sets of
positive and negative instances are not disjoint. This case has been studied by [Fu & Buchanan 1985].
We present here a solution, first given in [Kodratoff et al. 1986], which is completely different from the
one given by [Fu & Buchanan 1985].

Suppose that one starts from a set of examples {E} = {E1, ..., En} and counter-examples {CE} = {CE1,
..., CEm}.

Let G be a conjunctive generalization of {E} and let us suppose that some of the counter-examples,
say the subset {CE'}, are recognized by G.

We shall use the following example, inspired by plant pathology rules [Kodratoff et al. 1986].

Let the positive examples be:

E1 : (COLOR = RED) & (SIZE = VERY-BIG) & (TEXTURE = SOFT)

E2 : (COLOR = GREEN) & (SIZE = BIG) & (TEXTURE = HARD)

E3 : (COLOR = GREEN) & (SIZE = VERY-BIG) & (TEXTURE = HARD)

and let a counter-example be:

CE1 : (COLOR = RED) & (SIZE = BIG) & (TEXTURE = HARD)

Supposing that we know that BIG and VERY-BIG can generalize to LARGE, a conjunctive
generalization of {E1, E2, E3} is

G : (COLOR = ANY) & (SIZE = LARGE) & (TEXTURE = ANY)

This generalization is also a recognition function when one says that it recognizes its instances.
Unfortunately, it also recognizes CE1, which is one of its instances.

We suggest treating this noise effect by partitioning {E1, ..., En} in two.

Let {E'} = {E1, ..., Ej} and {E''} = {Ej+1, ..., En} be two disjoint subsets of {E}. Let G' and G'' be
generalizations of {E'} and {E''} such that G' v G'' does not recognize {CE'}.

Let {E'} = {E1, E3} and {E''} = {E2}.

G' = (COLOR = ANY) & (SIZE = VERY-BIG) & (TEXTURE = ANY)

G'' = E2.

The disjunction G' v G'' does not recognize CE.

Therefore, we have solved our initial problem of recognition of counter-examples by a generalization
of the examples.

This solution, which seems to be a little artificial, also allows one to favor noise-resistant
generalizations. This follows from the fact that there are usually many different partitions of {E} that
will generate disjunctive generalizations rejecting the counter-examples.

Let {E'} = {E2, E3} and {E''} = {E1}. This partition generates another disjunctive generalization that
rejects CE, namely:

GG' = (COLOR = GREEN) & (SIZE = LARGE) & (TEXTURE = HARD)

GG'' = E1.

and GG' v GG'' rejects CE.

On the contrary, the last partition, {E'} = {E1, E2} and {E''} = {E3}, generates the generalization G v
E3, which of course recognizes CE.

Let {G1', ..., Gk'} be the set of such disjunctive generalizations.

In our example, G1' = G' v G'', G2' = GG' v GG''.
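
The search for such partitions can be sketched as follows in Python. This is an illustrative reconstruction under stated assumptions, not the procedure of [Kodratoff et al. 1986]: descriptions are attribute-value dictionaries, the taxonomy is reduced to the single fact that BIG and VERY-BIG generalize to LARGE, only two-block partitions are tried, and all function names are hypothetical.

```python
from itertools import combinations

# Assumed one-level taxonomy: BIG and VERY-BIG both generalize to LARGE.
PARENT = {"BIG": "LARGE", "VERY-BIG": "LARGE"}

EXAMPLES = {
    "E1": {"COLOR": "RED", "SIZE": "VERY-BIG", "TEXTURE": "SOFT"},
    "E2": {"COLOR": "GREEN", "SIZE": "BIG", "TEXTURE": "HARD"},
    "E3": {"COLOR": "GREEN", "SIZE": "VERY-BIG", "TEXTURE": "HARD"},
}
CE1 = {"COLOR": "RED", "SIZE": "BIG", "TEXTURE": "HARD"}


def generalize_values(values):
    """A value covering all given ones (simplified: same value, common parent, or ANY)."""
    values = set(values)
    if len(values) == 1:
        return values.pop()
    parents = {PARENT.get(v) for v in values}
    if len(parents) == 1 and None not in parents:
        return parents.pop()
    return "ANY"


def generalize(examples):
    """Conjunctive generalization, computed descriptor by descriptor."""
    return {d: generalize_values(e[d] for e in examples) for d in examples[0]}


def recognizes(g, instance):
    """True when every descriptor value of g covers the value found in the instance."""
    return all(
        v in ("ANY", instance[d]) or PARENT.get(instance[d]) == v
        for d, v in g.items()
    )


def rejecting_partitions(examples, counter_example):
    """Two-block partitions whose disjunctive generalization rejects the counter-example."""
    names = sorted(examples)
    first, rest = names[0], names[1:]
    kept = []
    for size in range(len(rest)):  # the block holding `first` never takes everything
        for extra in combinations(rest, size):
            block1 = (first,) + extra
            block2 = tuple(n for n in names if n not in block1)
            g1 = generalize([examples[n] for n in block1])
            g2 = generalize([examples[n] for n in block2])
            if not (recognizes(g1, counter_example) or recognizes(g2, counter_example)):
                kept.append((block1, block2))
    return kept


for blocks in rejecting_partitions(EXAMPLES, CE1):
    print(blocks)
```

On the three examples above, the sketch keeps the partitions that separate E1 or E2 from the rest and discards the one that groups E1 with E2, in agreement with the discussion in the text.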

Some of the descriptors, out of which the generalizations are made, may be more or less
noise-resistant.

One will choose in this set the generalization that is the most noise-resistant, as shown by the two
following rules. Both of them rely on the fact that one can discriminate the noise resistance of the
descriptors.

In our example, we consider only the noise issued from descriptor polymorphy, and follow everyday
intuition. Let us accept that there is no polymorphy between RED and GREEN, some between HARD and
SOFT, and much between BIG and VERY-BIG. It follows that we consider that the colors are not noisy,
the texture somewhat noisy, and the sizes very noisy.

Rule-1 (purely symbolic).

For each Gi', consider the set of descriptors that discriminate {E} against {CE}. They are the important
descriptors that contain the features typical of {E}, and atypical of {CE}.

Rule-1 is then: choose the Gi' whose discriminant descriptors are the most noise-resistant.

In G1' = [(G' = (COLOR = ANY) & (SIZE = VERY-BIG) & (TEXTURE = ANY)) v (G'' = E2)], G'
rejects CE because of the descriptor SIZE, and G'' rejects CE because of the descriptor COLOR.

In G2' = [(GG' = (COLOR = GREEN) & (SIZE = LARGE) & (TEXTURE = HARD)) v (GG'' = E1)],
GG' rejects CE because of the descriptor COLOR and GG'' rejects CE because of the descriptors SIZE
and TEXTURE.

It follows that G2' uses more noise-resistant descriptors, since the SIZE used in GG'' is helped by a
supplementary difference in TEXTURE.

Rule-1 would lead us to choose G2' as the correct, noise-resistant, disjunctive generalization.
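
One possible way to mechanize Rule-1 is sketched below in Python. The scoring is our own assumption and is not stated in the text: descriptors receive ordinal noise ranks (COLOR 0, TEXTURE 1, SIZE 2), a disjunct is taken to be as reliable as its least noisy discriminating descriptor, and a disjunctive generalization is judged by its weakest disjunct. The names `NOISE`, `discriminating` and `weakness` are hypothetical.

```python
# Sketch of Rule-1 with an assumed ordinal ranking of descriptor noise.
NOISE = {"COLOR": 0, "TEXTURE": 1, "SIZE": 2}
PARENT = {"BIG": "LARGE", "VERY-BIG": "LARGE"}

CE = {"COLOR": "RED", "SIZE": "BIG", "TEXTURE": "HARD"}
E1 = {"COLOR": "RED", "SIZE": "VERY-BIG", "TEXTURE": "SOFT"}
E2 = {"COLOR": "GREEN", "SIZE": "BIG", "TEXTURE": "HARD"}
G_PRIME = {"COLOR": "ANY", "SIZE": "VERY-BIG", "TEXTURE": "ANY"}
GG_PRIME = {"COLOR": "GREEN", "SIZE": "LARGE", "TEXTURE": "HARD"}

G1 = [G_PRIME, E2]    # G1' = G' v G''
G2 = [GG_PRIME, E1]   # G2' = GG' v GG''


def covers(general, specific):
    """True when the general value admits the specific one."""
    return general in ("ANY", specific) or PARENT.get(specific) == general


def discriminating(disjunct, counter_example):
    """Descriptors on which the disjunct rejects the counter-example."""
    return [d for d, v in disjunct.items() if not covers(v, counter_example[d])]


def weakness(disjunction, counter_example):
    """Noise rank of the weakest disjunct, each judged by its best discriminant."""
    return max(
        min((NOISE[d] for d in discriminating(g, counter_example)), default=float("inf"))
        for g in disjunction
    )


best = min((("G1'", G1), ("G2'", G2)), key=lambda item: weakness(item[1], CE))
print(best[0])
```

With these ranks, G1' depends on SIZE alone for one of its disjuncts, whereas the weakest disjunct of G2' can also rely on TEXTURE, so the sketch selects G2' as in the text.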

Rule-2 (purely numeric).

It may happen that Rule-1 is not operative because the disjunctive generalizations use the same
descriptors.

Imagine, in our example, that TEXTURE does not appear in G2'.

Let G1' and G2' be two disjunctive generalizations that cannot be ordered by Rule-1.

As we have already seen in our example, different discriminating descriptors are issued from the
different clusters where the generalizations come from.

Call {{E1'}, {E1''}} the partition of {E} which generates G1' and call {{E2'}, {E2''}} the partition of {E}
which generates G2'.

Let us call P' the descriptor, common to {E1'} and {E2''}, that discriminates {E} from {CE}, and call
P'' the descriptor, common to {E1''} and {E2'}, that discriminates {E} from {CE}.

For the sake of clarity, suppose further that P' is less noisy than P'', and that
{E1'} and {E2''} contain more elements than {E1''} and {E2'}.

Then P' is less noisy than P'', and it is issued from a statistically more significant subset of examples.

Rule-2 is: whenever possible, choose the disjunctive generalization that makes use of discriminant
descriptors that are both the most statistically significant and the least noise-sensitive.

Imagine, in our example, that TEXTURE does not appear in G2'; the two disjunctive generalizations
would then use the same set of descriptors.

In G1', the more noisy descriptor, SIZE, is the one which is issued from E1 and E3. It does not fit the
conditions of Rule-2.

In G2', the less noisy descriptor, COLOR, is issued from E2 and E3. This gives us a second reason for
choosing G2' as noise-resistant disjunctive generalization.

The above technique allows one to combine numeric data about the number of examples covered by a
description, numeric or symbolic data about the noise associated to each descriptor, and symbolic data
about disjunctive generalizations.


2.3 - Learning noise-resistant strategies

When a sequence of commutative operators has to be applied to achieve a goal, it is a matter of
strategy to decide in which order the operators must be applied.

Similarly, the classical "conflict resolution" of Expert Systems is nothing but a strategic choice
about which operator to apply next, when several are available.

This problem can be illustrated by the following example, taken from [Bisseret & Girard 1973]. It
simulates the controls necessary when two planes are exiting an air-traffic sector. There are two
conflicting flights, and the problem is to find which must change its flight profile.

One can ask as first question whether Flight-1 must be lowered to exit the sector. A part of the decision
tree is then:


[Figure: part of the decision tree rooted at "Flight-1 lowered to exit sector?". The recoverable node
labels are "Flight-2 steady and level?", "Flight-2 able to become steady and level?", "Only Flight-2
steady and level", "Flight-1 changes profile" and "Flight-2 changes profile".]


One can also ask first whether Flight-2 is steady and level. A part of the corresponding decision tree is
then the following.


[Figure: part of the decision tree rooted at "Flight-2 steady and level?". The recoverable node labels
are "Flight-1 lowered to exit sector?", "Flight-1 steady and level?" and "Flight-1 changes profile".]


The choice between these two strategies, so claim the specialists in air-traffic control, is not due to any
noise (and so we hope!), but to an estimation of the complexity of the exact calculations in each
case.

From our AI point of view, noise and calculation complexity can well be treated as one and the same.

As the above example shows, depending on the computation complexity, or noise, relative to the
answer to "Flight-2 steady and level?", it is wise or not to ask it as the first question.

More generally, it is quite evident that strategies should be adapted to the noise of the descriptors they
use. The least noisy descriptors should be used as early as possible, and the most noisy ones as late as
possible.

Before discussing some solutions to the problem of obtaining these strategies, let us first point
out that, again, a "pure" symbolic problem, viz. the obtaining of variable strategies, is one of the
solutions to noise handling.

How  to  obtain  sets  of  strategies? 


1 - From Human Experts.

The above strategies for flights exiting a sector were directly given by the expert.

More generally, our experience shows that human experts quite dislike providing rules, which is almost
exclusively what they are asked for, and just love providing strategies, which they are not asked for.

This is why we have devised a system called DISCIPLE [Kodratoff & Tecuci 1986a, 1986b] which is
oriented towards the learning of strategies on the conditions in which rules must be applied.

In this system, the rules must be given, at least in an instantiated form, and the conditions for their
application are learned through a conversational interaction with an expert.

The system "guesses" the condition of application of a rule; then, from its data base, it applies the
guessed condition to its knowledge. It therefore proposes instantiated rules to the expert. When the
expert accepts them, the conditions of application are confirmed and accordingly generalized; when the
expert rejects them, the conditions of application are accordingly particularized.

2 - Automatic generation of strategies

When one automatically generates recognition functions, they can be used, as done by [Michalski &
Chilausky 1980], [Michalski et al. 1982], as conditions for rule application.

In these references, it is very clear that large rules, concluding to an action from a large set of
conditions, are looked for. It could be very useful to look for intermediary clusters of examples and
counter-examples that could provide intermediary rules, as for instance done by [Fu & Buchanan 1985].
In this way, which merges automatic generation of recognition functions and conceptual clustering, it
could be quite possible to automatically generate sets of possible strategies, that could then be
adapted to the noise condition in each particular application.


2.4  -  Polymorphic  Version  Space 


The notion of Version Space has been introduced by T. Mitchell [Mitchell 1982], who describes it as a
set of possible generalization states. Let us recall briefly Mitchell's results.

One generalizes from the examples, and the subset of "maximally specific generalizations" obtained by
generalization from the examples is called the S-set. In this paper, we shall use the extension of
Mitchell's ideas due to Utgoff [Utgoff 1986], and suppose that intermediary concepts (called "bias" by
Utgoff) are always available.

One particularizes from counter-examples, and the subset of "maximally general generalizations"
obtained from the counter-examples is called the G-set.

In order to illustrate it, let us use the following example, from [Mitchell, Utgoff & Banerji 1983].


If the examples are instances of SIN and COS, then the S-set is made of all the sons of TRIG by
following Mitchell, while Utgoff allows us to suppose that there is always some intermediary concept (let
us call it here: SIN v COS) that makes the S-set nearer to the examples.

If the counter-examples are instances of POLY, then the G-set is made of the sons of TRANS.

A first order logic presentation of Mitchell's ideas will allow us to discuss their generalization to noisy
data.

Let P be the first order logic predicate that expresses the success of an action done in the situation Ai.

In the above example, imagine that one is concerned with symbolic integration, and that one has
a set of possible operators at one's disposal, among them an operator of integration by parts:


OP1 : ∫ u dv = uv - ∫ v du.

Then Ai = [Functional part of dv = SIN], and P = [Success of Integration by Parts by applying OP1].
For the sake of brevity, we leave implicit in the rest of this section that the Integration by Parts is always
done by using OP1.

Let {Ai} be the set of the situations that ensure success during the training phase; then each Ai is such
that Ai ⇒ P. Therefore, each Ai is a sufficient condition for the validity of P.

The S-set (with the bias extension) is therefore a set of sufficient conditions for a success.

"Functional part of dv = SIN" is a sufficient condition for the integration by parts to succeed. "Functional
part of dv = COS" is also one.


Let {A'i} be the set of the actions that ensure failure during the training phase, and let {CAi} be the
complementary set to {A'i}. The set {CAi} is the G-set for the value of u in the Integration by Parts.

Otherwise stated, given {A'i}, one tries to find another subset {CAi} such that, for each i,

A'i ⇔ ¬CAi

Since each A'i is a failure, it is such that

A'i ⇒ ¬P.


It trivially follows that

P ⇒ CAi

Therefore, each CAi is a necessary condition for the validity of P. The G-set is therefore a set of
necessary conditions for a success.
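
The step from "each failure excludes its complementary concept" to "the complementary concept is a necessary condition" is purely propositional, so it can be checked by brute force over truth values. The following Python sketch does so; the variable names `a_fail`, `ca` and `p` are ours and stand for a failure situation, its complementary concept and the success predicate P.

```python
from itertools import product


def implies(p, q):
    """Classical material implication."""
    return (not p) or q


def derivation_holds(a_fail, ca, p):
    """From (a_fail <=> not ca) and (a_fail => not p), conclude (p => ca)."""
    hypotheses = (a_fail == (not ca)) and implies(a_fail, not p)
    return implies(hypotheses, implies(p, ca))


ALWAYS = all(derivation_holds(*values) for values in product([False, True], repeat=3))
print(ALWAYS)
```

The check relies on classical two-valued logic, which is exactly the first hypothesis discussed in the remainder of this section.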


Consider the above hierarchy. Since POLY and TRANS are two different sons of the same father, one
knows that, in well-behaved taxonomies, they exclude each other, i.e. that POLY ⇔ ¬TRANS.


Since POLY is a counter-example to the success of the integration by parts with Functional part of dv
= POLY, the G-set of the integration by parts is Functional part of dv = TRANS and its sons.


In  greater  details,  one  can  see  that 

[Funcnonal  part  of  dv  »  POLY]  =»  —(success  of  Integration  by  Parts f. 


Because of the taxonomy, one has

$\text{POLY} \Leftrightarrow \neg\text{TRANS}$.

Using the classical fact that

$A \Rightarrow B$ is equivalent to $\neg A \vee B$,

one has

$\neg[\text{Functional part of } dv = \text{POLY}] \vee \neg[\text{success of Integration by Parts}]$

which is equivalent to

$[\text{Functional part of } dv = \text{TRANS}] \vee \neg[\text{success of Integration by Parts}]$

and, therefore, to

$[\text{success of Integration by Parts}] \Rightarrow [\text{Functional part of } dv = \text{TRANS}]$.
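The way a failing concept leaves its brothers as the G-set can be sketched as follows. The taxonomy fragment is hypothetical (only POLY, TRANS, SIN and COS are named in the text; FUNCTION and TRIG are our assumptions), and the function names are ours.

```python
# Hypothetical fragment of the function taxonomy; each key maps a concept
# to its sons. Only POLY, TRANS, SIN and COS come from the text.
TAXONOMY = {
    "FUNCTION": ["POLY", "TRANS"],
    "TRANS": ["TRIG"],
    "TRIG": ["SIN", "COS"],
}


def descendants(node):
    """Return node together with all of its sons, grandsons, etc."""
    result = [node]
    for son in TAXONOMY.get(node, []):
        result.extend(descendants(son))
    return result


def father_of(node):
    """Return the father of node, or None for the root."""
    for father, sons in TAXONOMY.items():
        if node in sons:
            return father
    return None


def g_set_after_failure(counter_example):
    """In a well-behaved taxonomy brothers exclude each other, so a failing
    concept leaves its brothers (and their sons) as necessary conditions."""
    father = father_of(counter_example)
    if father is None:
        return []
    kept = []
    for brother in TAXONOMY[father]:
        if brother != counter_example:
            kept.extend(descendants(brother))
    return kept


print(g_set_after_failure("POLY"))  # ['TRANS', 'TRIG', 'SIN', 'COS']
```

With exactly two brothers, as here, the negation of POLY is TRANS itself; with more brothers it would be their disjunction.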

The interest of this small theorization lies in the fact that it gives the two main hypotheses under
which Version Spaces are tractable.

On the one hand, one must have $\neg\neg A = A$, i.e., one must not use intuitionistic logics. This restriction
is not so strong in practice.

Nevertheless, the following example will show that one must be careful while using negation in a
reasoning step.

Suppose that one is working with red, green, and blue colors.

Let r be a particular red color. Then a possible negation of this red color, $\neg r$, may be a particular
green color, say g. Now, possible negations of g are of course r, but also any other color which is not
green, for example a particular blue, b.

In that case, $\neg\neg r$ may take the value b instead of the expected r.

One has to suppose that special care is taken to track the origin of the negations when double
negation is applied, in order to insure the validity of classical logics.
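The color example can be made concrete. The sketch below is our own illustration, not a mechanism described in the paper: the naive version picks any possible negation at each step, while the tracked version remembers the term a negation came from.

```python
COLORS = ["r", "g", "b"]  # a particular red, green and blue


def negations(color):
    """All colors that are a possible negation of the given one."""
    return [c for c in COLORS if c != color]


def naive_double_negation(color, pick_first, pick_second):
    """Negate twice, picking one possible negation each time by index."""
    once = negations(color)[pick_first]
    return negations(once)[pick_second]


class Negated:
    """A negation that remembers the term it was obtained from."""

    def __init__(self, origin):
        self.origin = origin


def negate(term):
    """Negating a tracked negation gives back its origin."""
    if isinstance(term, Negated):
        return term.origin
    return Negated(term)


print(naive_double_negation("r", 0, 1))  # b  (r -> g -> b)
print(negate(negate("r")))               # r
```

Keeping the origin is one possible reading of "tracking the origin of the negations"; other bookkeeping schemes would serve equally well.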

On the other hand, in order to build the G-set, one must find counter-examples, $A_i$, that exclude some
other predicates, i.e. such that $A_i \Leftrightarrow \neg CA_i$.

In this case, concept polymorphy, i.e. the fact that concepts are not always disjoint, will prevent an
easy building of the G-set.

We  shall  illustrate  now  our  claims  with  concepts  of  colors,  which  clearly  are  partially  polymorphic. 

For instance, red and rose are polymorphic, because some of their instances may be confused, but red
and green are not.

The  colors  are  then  not  represented  by  a  unique  point  in  a  state  space,  but  by  a  set  of  the  possible 
instances  of  each  color. 

One will have to construct a taxonomic-like tree, similar to the one of the Version Space. Many more
links, indicating a partial polymorphy, will have to be added to the taxonomy. Coefficients can be
associated to each link, indicating how important the polymorphy is. We insist that the existence of
coefficients, and the way they are combined, is not the main issue. On the contrary, the main issue is to
keep track of the successive steps of reasoning, in order to be able to provide explanations to the user.
This last point has been explained in [Duval & Kodratoff 1986] in the context of uncertain reasoning.

Consider the following taxonomy for colors, and the associated links of polymorphy. No horizontal link
between two concepts means no polymorphic link.

Strong polymorphy is marked by a mere confusion of concepts, as brown below. Medium polymorphy is
marked by a horizontal line of - . Small polymorphy is marked by a horizontal line of * . dk stands for
dark, lgt for light.


[Figure: taxonomy of colors and its polymorphy links. The recoverable labels are: primary and secondary
at the top level; blue, red, yellow and orange among the colors; dk and lgt sons under each color; and
the confused concepts (brown) and (chestnut). The exact layout of the links is lost.]


It is generally understood that, for instance, primary and secondary colors cannot be confused, i.e. that
$\text{primary} \Leftrightarrow \neg\text{secondary}$. This is why there is no horizontal line between primary and secondary.
Nevertheless, polymorphy of their sons will induce some (implicit) polymorphy between them.

When a counter-example is given, one will have to check which predicates can be truly considered as
behaving classically, i.e. they have no polymorphy with the counter-example predicate. One will also have
to keep track of partial polymorphy, which tells that they are only partially rejected by the
counter-example.

Suppose that the above taxonomy is used to allow or reject some action P, and that the color light red
is a counter-example: $\text{light red} \Rightarrow \neg P$.

The  construction  of  the  G-set  will  proceed  as  follows. 

Let  us  first  suppose  that  "first  generation"  polymorphy  only  is  considered. 

Light red is a primary color, polymorphic with some secondary colors, namely light purple and light
orange. It follows that the G-set contains all secondary colors, except these two, as in the figure below.


[Figure: the same taxonomy (blue, red, yellow, orange and green, each with dk and lgt sons, plus
(brown) and (chestnut)), showing the G-set obtained for the counter-example light red. The outline of
the G-set is lost.]
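The first-generation construction can be sketched in a few lines. The list of secondary colors and their leaves is an assumption based on the colors named in the text; the polymorphy links of light red are those stated in the text. Function and variable names are ours.

```python
# Secondary colors and their dark/light sons; the names are assumptions
# based on the colors mentioned in the text.
SECONDARY = {
    "purple": ["dark purple", "light purple"],
    "orange": ["dark orange", "light orange"],
    "green": ["dark green", "light green"],
}

# Polymorphy links stated in the text for light red and dark red.
POLYMORPHIC_WITH = {
    "light red": ["light purple", "light orange", "dark red"],
    "dark red": ["dark purple", "dark orange", "light red"],
}


def first_generation_g_set(counter_example):
    """Secondary leaves that are not directly polymorphic with the
    counter-example, which is assumed to be a primary color."""
    excluded = set(POLYMORPHIC_WITH.get(counter_example, []))
    return [leaf
            for sons in SECONDARY.values()
            for leaf in sons
            if leaf not in excluded]


print(first_generation_g_set("light red"))
# ['dark purple', 'dark orange', 'dark green', 'light green']
```

Only direct links are followed here, so the dark variants of purple and orange stay in the G-set at this stage.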

Let us now also take into account the fact that "second generation polymorphy" can be important.
Light red is also feebly polymorphic with dark red which, in turn, is polymorphic with dark purple and
dark orange.

Therefore, dark purple and dark orange must also be excluded from the G-set, but with much less
strength than their light counterparts.

Now, and only now, some way of combining coefficients will be necessary to have a correct modelling
of the necessary knowledge.

In Mitchell's Version Space, the G-set is a one-dimensional entity since only one point, the node where it
starts, is necessary to describe it.

If no combination of polymorphy is allowed, a two-dimensional G-set must be used, in order to tell
which predicates are excluded from it, as seen in the above figure.

If one allows combinations of polymorphy coefficients, a three-dimensional G-set becomes necessary.
The third dimension tells the intensity with which the predicates belong to the G-set.
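As an illustration of this third dimension, the sketch below attaches a rejection strength to each color reached from the counter-example. The coefficients are invented for the example (the text only says that the light red / dark red link is feeble), and multiplying coefficients along a chain is merely one assumed combination rule; the paper stresses that the choice of rule is not the main issue.

```python
# Hypothetical polymorphy coefficients in [0, 1]; the numbers are
# illustrative only.
LINKS = {
    ("light red", "light purple"): 0.5,
    ("light red", "light orange"): 0.5,
    ("light red", "dark red"): 0.2,
    ("dark red", "dark purple"): 0.5,
    ("dark red", "dark orange"): 0.5,
}


def neighbours(color):
    """Yield (other color, coefficient) for every link touching color."""
    for (a, b), weight in LINKS.items():
        if a == color:
            yield b, weight
        elif b == color:
            yield a, weight


def exclusion_strengths(counter_example, generations=2):
    """Strength with which each color is rejected, multiplying the
    coefficients along a chain of links (an assumed combination rule)."""
    strengths = {}
    frontier = {counter_example: 1.0}
    for _ in range(generations):
        next_frontier = {}
        for color, strength in frontier.items():
            for other, weight in neighbours(color):
                if other == counter_example:
                    continue
                combined = strength * weight
                if combined > strengths.get(other, 0.0):
                    strengths[other] = combined
                    next_frontier[other] = combined
        frontier = next_frontier
    return strengths


strengths = exclusion_strengths("light red")
print(strengths["light purple"], strengths["dark purple"])
```

With these numbers the dark colors are rejected with a fifth of the strength of their light counterparts, which matches the qualitative statement of the text; setting `generations=1` gives back the two-dimensional, first-generation picture.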

The same kind of work can be done for the S-set. One will obtain in a similar way a three-dimensional
Version Space.

One knows that the S-set and the G-set must coincide in order to obtain what we call here necessary
and sufficient conditions for the application of the rules.

In  the  case  of  polymorphic  Version  Space,  the  same  is  true,  but  the  coincidence  can  be  approximate 
only,  and  must  hold  between  complicated  shapes. 

In general, the G-set and the S-set will not coincide but simply intersect. In most cases, one will
even have nothing but information about the likelihood of this intersection.

Therefore, the global information about noise or polymorphy will not be totally contained in belief
coefficients only.

Numeric coefficients are of course necessary to convey the information about the likelihood of the
intersection, but one must be aware that it would be wrong to forget the essential information conveyed
by our extension of the Version Spaces, which can be described as follows.

Let $P_1$ be a predicate belonging to the G-set only, $P_2$ be a predicate belonging to the S-set only, and
$[P_1, P_2]$ be the set of predicates that are sons of $P_1$ and fathers of $P_2$.

Then, the exact generalization state is unknown, but belongs to $[P_1, P_2]$.
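The interval between a G-set predicate and an S-set predicate can be computed directly from the taxonomy. The sketch below uses a hypothetical son-to-father table, reads "sons" and "fathers" transitively, and includes the two bounds in the result; all three choices are our assumptions.

```python
# Hypothetical taxonomy, son -> father.
FATHER = {
    "TRANS": "FUNCTION",
    "POLY": "FUNCTION",
    "TRIG": "TRANS",
    "SIN": "TRIG",
    "COS": "TRIG",
}


def ancestors(node):
    """Fathers, grandfathers, ... of node, nearest first."""
    chain = []
    while node in FATHER:
        node = FATHER[node]
        chain.append(node)
    return chain


def interval(p1, p2):
    """Predicates lying between p1 (general) and p2 (specific), with both
    bounds included; empty if p1 is not above p2 in the taxonomy."""
    above_p2 = ancestors(p2)
    if p1 not in above_p2:
        return []
    between = above_p2[:above_p2.index(p1)]
    return [p1] + list(reversed(between)) + [p2]


print(interval("FUNCTION", "SIN"))  # ['FUNCTION', 'TRANS', 'TRIG', 'SIN']
```

The returned list is the purely symbolic description of the uncertainty: the exact generalization state is one of its elements, without any coefficient being needed to say so.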

This  last  sentence  is  a  way  of  describing  uncertainty  by  a  purely  symbolic  method  which  could  never 
have  been  imagined  without  Mitchell’s  noise-free  Version  Spaces. 


CONCLUSION 


In a recent paper, we claim that AI is not a sub-field of Computer Science, but a new Science by itself
[Kodratoff 1986b], independent from its parents Mathematics, Logics, and of course Computer Science.

We shall simply recall here our main argument. AI has its own well-identified field of research,
namely the definition, measurement and applications of explanations given in the own language of its
user. In other words, while all other Sciences provide explanations in their own language (very often
they are even able to become rather esoteric!), the topic of AI is to reach a point where it can provide
explanations in the own language of the user of AI.

We do not want to argue this point here, but would rather try to show that the rest of this paper, in a
perhaps indirect way, tries to help achieving this goal. Even if the reader disagrees with our position that
AI is the science of explanations, he can still discuss our point that a better definition of generalization
(section 1 of this paper) and a systematic use of symbolic techniques (section 2 of this paper) are
good tools to achieve a better explicativeness of AI systems.


About section 1, one may well wonder what its content may have to do with explicableness, since it
looks like theoretical discussions about a formal definition of generalization. Of course, one can argue
in a very abstract way that better definitions always lead to better understandability. In the case of
generalization, it is by itself a kind of explanation of why the examples are similar. Refining generalization
may help to recover some hidden common feature which can be an explanation. Counter-examples
will be necessary to decide what is and what is not an explanation.

Recall the examples of section 1.7, $E_1 = \text{MAMMALIAN}(A)\ \&\ \text{BRED\_ANIMAL}(A)$, $E_2 = \text{TAME}(B)\ \&\ \text{VIVIPAROUS}(B)$,
from which we could find the generalization $G = \text{TAME}(x)\ \&\ \text{MAMMALIAN}(x)$.
Suppose now that a counter-example to G is $CE = \text{DANGEROUS}(\text{LION-1})$. From it, we can now tell
that even though (implicitly) present in both examples, MAMMALIAN and TAME are not the good
explanation of the link between the examples. One has to use a rule $\forall x\ [\text{TAME}(x) \Rightarrow \text{HARMLESS}(x)]$, and
some knowledge of the kind $\forall x\ [\text{DANGEROUS}(x) \Rightarrow \neg\text{HARMLESS}(x)]$, to be able to explain that these
examples are about harmless animals. Without introducing TAME in $E_1$, by the use of $R_1$, one would
have been unable to find this explanation.
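The role of the rules in finding this explanation can be sketched with a small forward-chaining loop. The rule BRED_ANIMAL => TAME is our assumed form of the rule that introduces TAME in the first example; the rule TAME => HARMLESS and the incompatibility between DANGEROUS and HARMLESS come from the text. Predicates are treated as propositional labels for brevity.

```python
# Rules as (premise, conclusion) over unary predicates. The first rule is an
# assumed form of the rule that introduces TAME in the first example.
RULES = [
    ("BRED_ANIMAL", "TAME"),
    ("TAME", "HARMLESS"),
]

# DANGEROUS(x) => not HARMLESS(x)
INCOMPATIBLE = {("DANGEROUS", "HARMLESS")}


def saturate(facts):
    """Forward-chain the rules until no new predicate can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in RULES:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def explanation(examples, counter_example):
    """Predicates shared by all saturated examples and contradicted by
    the counter-example."""
    common = set.intersection(*(saturate(e) for e in examples))
    return sorted(p for p in common
                  if any((c, p) in INCOMPATIBLE for c in counter_example))


E1 = {"MAMMALIAN", "BRED_ANIMAL"}
E2 = {"TAME", "VIVIPAROUS"}
print(explanation([E1, E2], {"DANGEROUS"}))  # ['HARMLESS']
```

Removing the first rule from `RULES` leaves the two examples with no common predicate at all in this sketch, so nothing could be proposed as an explanation.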

Our refinements to generalization are not explanatory by themselves, but they may allow explanatory
processes to start.

About section 2, its content is much more evidently linked to explicableness. Symbolic techniques
keep the kind of information that provides explanations while numeric ones (and especially coefficient
combinations) do not.

As an example, consider the LEX system which is capable of carrying out formal integrations
[Mitchell, Utgoff & Banerji 1983]. As seen in section 2.4, the learning part of LEX has been trying to
make identical the G-set and the S-set of 'u' and 'dv' in integration by parts. Suppose that it
succeeded by finding that these common G-and-S-sets are 'polynomial' for 'u' and 'trigonometric' for
'dv'.

Suppose now the system is asked to integrate $\int 3x \cos x \, dx$ and that it chooses to integrate by parts
with $u = 3x$ and $dv = \cos x \, dx$.

It is, at least in principle, capable of explanations in the sense that it is capable, when asked the
question: "Why have you chosen this way of integrating?", of giving the answer: "Because I had the option
of choosing a 'u' which is a polynomial and a 'dv' whose functional part is a trigonometric
function."

The symbolic handling of knowledge about necessary and sufficient conditions makes this
kind of explanation possible.
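A minimal sketch of how such an answer could be produced from the learned conditions is given here. It does not reproduce LEX; the table of learned conditions is the converged result supposed in the text, while the classification table and the function name are hypothetical.

```python
# Assumed result of learning: the converged G-and-S-sets from the text.
LEARNED_CONDITIONS = {"u": "polynomial", "dv": "trigonometric"}

# Hypothetical classification of the parts of 3x cos x dx.
CLASS_OF = {"3x": "polynomial", "cos x": "trigonometric"}


def explain_choice(u, dv_functional_part):
    """Return an explanation if the choice meets the learned conditions,
    otherwise None."""
    if (CLASS_OF.get(u) == LEARNED_CONDITIONS["u"]
            and CLASS_OF.get(dv_functional_part) == LEARNED_CONDITIONS["dv"]):
        return (f"Because '{u}' is a {LEARNED_CONDITIONS['u']} and the "
                f"functional part of dv, '{dv_functional_part}', is a "
                f"{LEARNED_CONDITIONS['dv']} function.")
    return None


print(explain_choice("3x", "cos x"))
```

The explanation is built from the same symbols that drive the decision, which is why it can be phrased in the vocabulary of the user.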

As a kind of counter-example, imagine that it could be quite possible to also achieve very good results
in symbolic integration by assigning coefficients to the possible 'u' and 'dv' in integration by parts,
and learning by increasing the coefficients in case of success, and decreasing them in case of failure. No
explanations can be given from this kind of learning.
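For contrast, the coefficient-based learner imagined above can be sketched as follows. The class, its step size and the training pairs are our own illustration, not a system described in the paper.

```python
from collections import defaultdict


class CoefficientLearner:
    """Keeps one number per (u, dv) choice; nothing else is remembered."""

    def __init__(self, step=0.1):
        self.step = step
        self.coefficients = defaultdict(float)

    def update(self, u, dv, success):
        """Increase the coefficient on success, decrease it on failure."""
        self.coefficients[(u, dv)] += self.step if success else -self.step

    def best_choice(self):
        """Return the choice with the highest coefficient."""
        return max(self.coefficients, key=self.coefficients.get)


learner = CoefficientLearner()
learner.update("3x", "cos x dx", success=True)
learner.update("cos x", "3x dx", success=False)
print(learner.best_choice())  # ('3x', 'cos x dx')
```

Such a learner can report which choice scores highest, but it holds no condition such as "polynomial" or "trigonometric" from which an answer to "why?" could be built.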